Thresher: determining the number of clusters while removing outliers
Similar resources
Automatically Determining Number of Clusters
Automatically determining the number of clusters in the data is an unsolved/unexplored problem. First I'll show why we need to do this, and whether this is a reasonable problem in text clustering in particular. Then, starting from a simple 1-d/2-d study, I find BIC (Bayesian Information Criterion) is a useful measure, which, by penalizing model fitness by model complexity, usually tells the right number of...
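For reference, the trade-off between fit and complexity mentioned in this abstract is visible in the textbook definition of BIC. The symbols below are not taken from the cited work, which may use a different sign or scaling convention: p is the number of free model parameters, n the number of observations, and \hat{L} the maximized likelihood.

```latex
\mathrm{BIC} = p \ln n - 2 \ln \hat{L}
```

Under this convention a lower value is better: adding clusters raises \ln \hat{L} but also raises p, so the criterion favors the smallest model that still fits well.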
Removing Outliers in Illumination Estimation
A method of outlier detection is proposed as a way of improving illumination-estimation performance in general, and for scenes with multiple sources of illumination in particular. Based on random sample consensus (RANSAC), the proposed method (i) makes estimates of the illumination chromaticity from multiple, randomly sampled sub-images of the input image; (ii) fits a model to the estimates; (i...
Probabilistic estimation while ignoring outliers
Logistic regression learns a parameterized mapping from feature vectors to probability vectors and is for example central to estimating click rates for ads on web pages. The parameter is found by minimizing the logistic loss. However minimizing any convex loss summed over a set of examples is prone to outliers. We define a versatile method for designing non-convex losses that ameliorate the eff...
Determining the Number of Trace Clusters: a Stability-based Approach
Given the complexity of real-life event logs, several trace clustering techniques have been proposed to partition an event log into subsets with a lower degree of variation. In general, these techniques assume that the number of clusters is known in advance. However, this will rarely be the case in practice. Therefore, this paper is the first to present an approach to determine the appropriate ...
Determining the Number of Clusters via Iterative Consensus Clustering
We use a cluster ensemble to determine the number of clusters, k, in a group of data. A consensus similarity matrix is formed from the ensemble using multiple algorithms and several values for k. A random walk is induced on the graph defined by the consensus matrix and the eigenvalues of the associated transition probability matrix are used to determine the number of clusters. For noisy or high...
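The idea in this abstract can be illustrated with a minimal sketch, assuming NumPy is available. It is not the cited authors' implementation: the function names `consensus_matrix` and `estimate_k` are hypothetical, the toy labelings are made up for illustration, and reading k off the largest gap between consecutive eigenvalues is an assumed heuristic, since the truncated abstract does not say exactly how the eigenvalues are used.

```python
import numpy as np

def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of points shares a cluster."""
    labelings = np.asarray(labelings)
    # same[r, i, j] is True when run r puts points i and j in the same cluster
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)

def estimate_k(consensus, max_k=None):
    """Estimate k from the eigenvalues of the random-walk transition matrix.

    The transition matrix P = D^-1 C has the same eigenvalues as the
    symmetric matrix D^-1/2 C D^-1/2, which is used here for stability.
    Assumption: k is taken at the largest gap between sorted eigenvalues.
    """
    degrees = consensus.sum(axis=1)  # positive, because the diagonal is 1
    inv_sqrt = 1.0 / np.sqrt(degrees)
    sym = consensus * inv_sqrt[:, None] * inv_sqrt[None, :]
    evals = np.linalg.eigvalsh(sym)[::-1]  # sorted in descending order
    gaps = evals[:-1] - evals[1:]
    if max_k is not None:
        gaps = gaps[:max_k]
    return int(np.argmax(gaps)) + 1

# Toy ensemble: three runs over six points, using k = 2, 3 and 3.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 2, 2, 2],
    [0, 0, 0, 1, 1, 2],
]
C = consensus_matrix(labelings)
k_hat = estimate_k(C)
```

In this toy ensemble no run ever groups a point from the first three with a point from the last three, so the consensus graph splits into two disconnected blocks and the transition matrix has two eigenvalues equal to 1.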
Journal
Journal title: BMC Bioinformatics
Year: 2018
ISSN: 1471-2105
DOI: 10.1186/s12859-017-1998-9